A fast and memory-efficient N-gram language model lookup method for large vocabulary continuous speech recognition
Authors
Abstract
Recently, minimum perfect hashing (MPH)-based language model (LM) lookup methods have been proposed for fast access to N-gram LM scores in lexical-tree based LVCSR (large vocabulary continuous speech recognition) decoding. Methods of node-based LM caching and LM context pre-computing (LMCP) have also been proposed to combine with MPH for further reduction of LM lookup time. Although these methods are effective, LM lookup still takes a large share of overall decoding time when trigram LM lookahead (LMLA) is used for lower word error rate than unigram or bigram LMLAs. Besides computation time, memory cost is also an important performance aspect of decoding systems. Most speedup methods for LM lookup obtain higher speed at the cost of increased memory demand, which makes system performance unpredictable when running on computers with smaller memory capacities. In this paper, an order-preserving LM context pre-computing (OPCP) method is proposed to achieve both fast speed and small memory cost in LM lookup. By reducing hashing operations through order-preserving access of LM scores, OPCP cuts down LM lookup time effectively. At the same time, OPCP significantly reduces memory cost because of the reduced size of hashing keys and the need to store only the last word index of each N-gram. Experimental results are reported on two LVCSR tasks (Wall Street Journal 20K and Switchboard 33K) with three sizes of trigram LMs (small, medium, large). In comparison with the above-mentioned existing methods, OPCP reduced LM lookup time from about 30–80% of total decoding time to about 8–14%, without any increase of word error rate. Except for the small LM, the total memory cost of OPCP for LM lookup and storage was about the same as or less than the original N-gram LM storage, and much less than the compared methods. The time and memory savings in LM lookup by using OPCP became more pronounced with the increase of LM size. © 2005 Elsevier Ltd. All rights reserved.
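The core idea the abstract describes (one hash probe per LM context, after which all trigrams sharing that context are accessed in order by their last word index alone) can be sketched as a toy data structure. This is an illustrative assumption, not the paper's actual implementation: the class name, block layout, and backoff handling here are invented for clarity.

```python
import bisect

class TrigramStore:
    """Toy order-preserving trigram store: per bigram context, a sorted
    list of last-word indices with a parallel list of scores."""

    def __init__(self):
        # context (w1, w2) -> (sorted last-word ids, parallel scores)
        self.blocks = {}

    def add(self, w1, w2, w3, score):
        words, scores = self.blocks.setdefault((w1, w2), ([], []))
        i = bisect.bisect_left(words, w3)
        words.insert(i, w3)
        scores.insert(i, score)

    def lookup_context(self, w1, w2):
        # One hash probe per context; every trigram sharing this context
        # is then found without any further hashing.
        return self.blocks.get((w1, w2))

    def score(self, block, w3, backoff=-99.0):
        # Binary search on the last word index only (order-preserving access).
        if block is None:
            return backoff
        words, scores = block
        i = bisect.bisect_left(words, w3)
        if i < len(words) and words[i] == w3:
            return scores[i]
        return backoff

store = TrigramStore()
store.add(5, 7, 2, -0.3)
store.add(5, 7, 9, -1.2)
blk = store.lookup_context(5, 7)
print(store.score(blk, 9))  # -1.2
print(store.score(blk, 4))  # -99.0 (not found, falls back)
```

Because only the last word index is stored per trigram and the context key is hashed once per context rather than once per candidate word, both hashing work and key storage shrink, which matches the savings the abstract claims at a high level.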
Similar Papers
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Lemmatized Latent Semantic Model for Language Model Adaptation of Highly Inflected Languages
We present a method to adapt statistical N-gram models for large vocabulary continuous speech recognition of highly inflected languages. The method combines morphological analysis, latent semantic analysis (LSA) and fast marginal adaptation for building topic-adapted trigram models, based on a background language model and very short adaptation texts. We compare words, lemmas and morphemes as b...
Generalized Fast On-the-fly Composition Algorithm for WFST-Based Speech Recognition
This paper describes a Generalized Fast On-the-fly Composition (GFOC) algorithm for Weighted Finite-State Transducers (WFSTs) in speech recognition. We already proposed the original version of GFOC, which yields fast and memory-efficient decoding using two WFSTs. GFOC enables fast on-the-fly composition of three or more WFSTs during decoding. In many cases, it is actually difficult or impossibl...
PocketSUMMIT: small-footprint continuous speech recognition
We present PocketSUMMIT, a small-footprint version of our SUMMIT continuous speech recognition system. With portable devices becoming smaller and more powerful, speech is increasingly becoming an important input modality on these devices. PocketSUMMIT is implemented as a variable-rate continuous density hidden Markov model with diphone context-dependent models. We explore various Gaussian param...
W - a Fast, Memory-Efficient One-Pass Stack Decoder
This paper describes features and implementation details of a fast, memory-efficient one-pass stack decoder designed for large vocabulary speech recognition with dictionaries of 65536 words. The stack decoder design made it possible to use arbitrary backoff N-gram language models in the first pass. A new on-demand N-gram LM-lookahead for the tree lexicon is introduced. Decoding time w...
Journal: Computer Speech & Language
Volume: 21, Issue: -
Pages: -
Publication date: 2007